Mapping a single-assignment language onto the Warp systolic array
Abstract
Single-assignment languages offer the potential to efficiently program parallel processors. This paper discusses issues that arise in mapping SISAL programs onto the Warp™ array, a linear systolic array in use at Carnegie Mellon. A Warp machine with ten cells can deliver up to 100 million floating-point operations per second. The paper begins with a discussion of systolic arrays as targets for single-assignment languages and the suitability of the Warp machine for this purpose. Systolic arrays can take advantage of both large-grain parallelism and fine-grain parallelism. The communication bandwidth of the systolic array gives the translator great flexibility in mapping a SISAL program onto the linear array. We present two principal methods to exploit parallelism on Warp: data partitioning and pipelining. Data partitioning is effective for local computations that depend on only a small neighborhood of values. Since SISAL allows the specification of array sizes at run-time, we have to provide static and dynamic methods for data partitioning. Many operations on the SISAL stream data type can be parallelized as a special case of dynamic data partitioning. Pipelining allows the overlapping of different stages of a computation or of function invocations. This method is well suited for Warp since the systolic array has high inter-cell communication bandwidth. This makes it possible to send large data sets to the next processor in a computation pipeline without performance degradation. We use matrix multiplication and a relaxation algorithm, respectively, as examples to illustrate the data partitioning and pipeline models for mapping SISAL programs onto the Warp array.
1. Introduction
Single-assignment languages offer an elegant way to program parallel computers. There is no need for the compiler to "extract" parallelism, and users do not get involved in the explicit management of parallelism in a program.
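To make the data partitioning idea from the abstract concrete, the following is a minimal Python sketch of static row-block partitioning for matrix multiplication. It is not the paper's mapping, which this excerpt does not show: the function names (`partition_rows`, `cell_multiply`, `partitioned_matmul`) are hypothetical, the "cells" are simulated one after another in a plain loop, and the only figure taken from the text is the ten-cell default.

```python
def partition_rows(n_rows, n_cells):
    """Split row indices 0..n_rows-1 into contiguous blocks, one per cell.

    Block sizes differ by at most one row; trailing cells get an empty
    block when there are fewer rows than cells.
    """
    base, extra = divmod(n_rows, n_cells)
    blocks = []
    start = 0
    for cell in range(n_cells):
        size = base + (1 if cell < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def cell_multiply(a_rows, b):
    """Work of one (hypothetical) cell: its rows of A times all of B."""
    inner = len(b)
    n_cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
        for row in a_rows
    ]


def partitioned_matmul(a, b, n_cells=10):
    """Compute A x B block by block (assumes B is non-empty).

    Each block depends only on its own rows of A, so on a real array the
    blocks could be computed by different cells at the same time; here
    they are simply processed in sequence.
    """
    result = []
    for block in partition_rows(len(a), n_cells):
        result.extend(cell_multiply([a[i] for i in block], b))
    return result
```

Because the row blocks are computed from the run-time length of `a`, the same sketch also hints at why array sizes known only at run-time call for a partitioning step that executes with the program rather than in the compiler.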
The challenge for the compiler writer and computer architect is to devise an efficient architecture that can exploit this implicit parallelism in practice. To date, there have been two major thrusts toward implementing single-assignment languages. Since single-assignment languages like VAL or SISAL are geared towards execution in a graph-oriented processing environment, some researchers have concentrated on building hardware that directly interprets a program graph. A program is translated into a graph representation; the nodes in this graph represent operations (or function invocations), and the arcs specify data dependencies between the nodes. Such an architecture is capable of exploiting fine-grain parallelism since there is the potential for a large number of elementary nodes to be
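The program-graph representation described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the example expression `(a + b) * (c - d)`, the `GRAPH` layout, and the `run_graph` helper are hypothetical and do not come from the paper or from any SISAL or VAL implementation. Nodes are operations, arcs are the names of the values a node consumes, and every name is assigned exactly once.

```python
import operator

# Hypothetical program graph for (a + b) * (c - d).
# Each node maps its single-assignment name to (operation, input names).
GRAPH = {
    "sum": (operator.add, ("a", "b")),
    "diff": (operator.sub, ("c", "d")),
    "prod": (operator.mul, ("sum", "diff")),
}


def run_graph(graph, inputs):
    """Fire nodes in waves: a node is ready once all its inputs exist.

    Nodes in the same wave have no data dependencies on each other, so a
    graph-interpreting machine could execute them concurrently.
    Returns the final values and the list of waves.
    """
    values = dict(inputs)
    pending = dict(graph)
    waves = []
    while pending:
        ready = sorted(
            name
            for name, (_, args) in pending.items()
            if all(arg in values for arg in args)
        )
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        # Compute the whole wave from the values available before it started.
        results = {
            name: pending[name][0](*(values[arg] for arg in pending[name][1]))
            for name in ready
        }
        values.update(results)
        for name in ready:
            del pending[name]
        waves.append(ready)
    return values, waves
```

For the graph above, `sum` and `diff` fall into the same wave and `prod` into the next, which is the kind of fine-grain parallelism the text attributes to graph-oriented execution.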
Similar resources
Path planning on the Warp computer : using a linear systolic array in dynamic programming
Given a map in which each position is associated with a traversability cost, the path planning problem is to find a minimum-cost path from a source position to every other position in the map. The paper proposes a dynamic programming algorithm to solve the problem, and analyzes the exact number of operations that the algorithm takes. The algorithm accesses the map in a highly regular way, so it...
The Warp Computer: Architecture, Implementation, and Performance
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes 10 cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the inc...
CMU-CS-84-158 Systolic Algorithms for the CMU Warp Processor
CMU is building a 32-bit floating-point systolic array that can efficiently perform many essential computations in signal processing like the FFT and convolution. This is a one-dimensional systolic array that in general takes inputs from one end cell and produces outputs at the other end, with data and control all flowing in one direction. We call this particular systolic array the Warp process...
RSA Acceleration with Field Programmable Gate
An efficient implementation of modular exponentiation, i.e., the main building block in the RSA cryptographic scheme, is achieved by first designing a bit-level systolic array such that the whole procedure of modular exponentiation can be carried out entirely by a single unit without using global interconnections or memory to store intermediate results, and then mapping this design onto Xilinx XC6...
Systolic algorithms for the CMU warp processor
CMU is building a 32-bit floating-point systolic array that can efficiently perform many essential computations in signal processing like the FFT and convolution. This is a one-dimensional systolic array that in general takes inputs from one end cell and produces outputs at the other end, with data and control all flowing in one direction. We call this particular systolic array the Warp processo...
Publication date: 1987